@kwannoel kwannoel commented Jun 9, 2023

I hereby agree to the terms of the RisingWave Labs, Inc. Contributor License Agreement.

What's changed and what's your intention?

RFC: https://www.notion.so/risingwave-labs/Arrangement-Backfill-31d4d0044726451c9cc77dc84a0f8bb2

We implement the arrangement backfill executor in this PR.
Integration tests are difficult to write for it, since CreateMviewProgress would need to be mocked as well, which does not seem trivial.

Since this is not used yet, I think it is fine to just review the corresponding code changes before merging. I will defer frontend changes to a later PR to avoid making this one too large.

It will be tested once frontend support is added: I will run it end-to-end with the existing e2e tests, plus scaling and recovery tests specific to arrangement backfill.

It will also be benchmarked against backfill then.

Comparison to backfill:

  • StateTable is needed for replication, so it uses StateTable instead of StorageTable. Later the frontend will need to take care of instantiating the table and making sure it is_replicated. We should add an assertion in ArrangementBackfill::new once that's done.
  • When we receive a barrier, we have to flush buffered changes to the state_table before any new snapshot read, to ensure they are replicated.
  • Because this involves a write to the StateTable, we also have to break out of the backfill stream loop first, so that we drop the immutable reference to the StateTable that is used for snapshot reads.
  • As a result, the entire barrier-processing logic moves outside the backfill stream loop.
  • This makes the logic a little convoluted; I welcome ideas to simplify it further. Refactoring it into a process_barrier function does not seem ideal either, since it touches 10+ variables.
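To illustrate why barrier handling has to move outside the snapshot loop, here is a minimal borrow-checker sketch. The StateTable below is a toy stand-in, not the real RisingWave API: the snapshot iterator holds an immutable borrow of the table, so the loop must end before the mutable flush is allowed.

```rust
// Toy stand-in for StateTable; the real RisingWave API is different.
struct StateTable {
    rows: Vec<i64>,
    buffered: Vec<i64>,
}

impl StateTable {
    // Snapshot read: holds an immutable borrow of self while iterating.
    fn snapshot_read(&self) -> impl Iterator<Item = &i64> {
        self.rows.iter()
    }
    // Flush buffered replication changes: needs a mutable borrow,
    // so it cannot overlap with an in-flight snapshot_read iterator.
    fn flush(&mut self) {
        let buffered = std::mem::take(&mut self.buffered);
        self.rows.extend(buffered);
    }
}

fn run_backfill(table: &mut StateTable) {
    // Phase 1: read the snapshot under an immutable borrow.
    let mut pending_barrier = false;
    for _row in table.snapshot_read() {
        // ... forward snapshot rows downstream ...
        pending_barrier = true; // pretend a barrier arrived upstream
        break; // break out of the loop so the immutable borrow ends
    }
    // Phase 2: the borrow is gone, so the mutable flush is now allowed.
    if pending_barrier {
        table.flush();
    }
}

fn main() {
    let mut table = StateTable { rows: vec![1, 2], buffered: vec![3] };
    run_backfill(&mut table);
    assert_eq!(table.rows, vec![1, 2, 3]);
}
```

Moving `table.flush()` inside the `for` loop would fail to compile, which is the same constraint that forces the real executor's barrier logic out of the stream loop.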

Changes to backfill state persistence:

  • In this PR we track progress per vnode.
  • This is so we can support scaling for it, which is the main use case (scaling backfill independently of the upstream mv).
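A hedged sketch of what per-vnode progress could look like. The names are modeled on the PR's BackfillProgressPerVnode; the real type lives in backfill/utils.rs and stores a pk row rather than an i64 position:

```rust
use std::collections::HashMap;

/// Per-vnode backfill progress (a sketch; the real enum in
/// backfill/utils.rs stores a pk row, not an i64).
#[derive(Clone, Debug, PartialEq)]
enum BackfillProgressPerVnode {
    NotStarted,
    InProgress { current_pos: i64 },
    Completed,
}

type VnodeId = u32;

/// Backfill is finished only when every owned vnode is Completed.
fn is_completely_finished(progress: &HashMap<VnodeId, BackfillProgressPerVnode>) -> bool {
    progress
        .values()
        .all(|p| *p == BackfillProgressPerVnode::Completed)
}

fn main() {
    let mut progress = HashMap::new();
    progress.insert(0, BackfillProgressPerVnode::Completed);
    progress.insert(1, BackfillProgressPerVnode::InProgress { current_pos: 42 });
    assert!(!is_completely_finished(&progress));
    progress.insert(1, BackfillProgressPerVnode::Completed);
    assert!(is_completely_finished(&progress));
}
```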

Changes to snapshot read:

  • Backfill can now have a mix of states as a result of scaling (some vnodes complete, others in progress or not started).
  • This means the snapshot read logic skips completed vnodes, resumes from the persisted offset for in-progress vnodes, and reads from the beginning for not-started vnodes.
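That per-vnode snapshot-read behavior can be sketched as a small planning function. The helper name `snapshot_read_plan` and the i64 positions are hypothetical; the real code works over pk ranges:

```rust
use std::collections::BTreeMap;

#[derive(Clone, Debug, PartialEq)]
enum Progress {
    NotStarted,
    InProgress { current_pos: i64 },
    Completed,
}

/// For each vnode that still needs a snapshot read, return the position
/// to resume after (None means "read the whole vnode from the start").
fn snapshot_read_plan(progress: &BTreeMap<u32, Progress>) -> BTreeMap<u32, Option<i64>> {
    progress
        .iter()
        .filter_map(|(vnode, p)| match p {
            Progress::Completed => None,                  // skip: already done
            Progress::NotStarted => Some((*vnode, None)), // full range
            Progress::InProgress { current_pos } => Some((*vnode, Some(*current_pos))),
        })
        .collect()
}

fn main() {
    let progress = BTreeMap::from([
        (0, Progress::Completed),
        (1, Progress::InProgress { current_pos: 7 }),
        (2, Progress::NotStarted),
    ]);
    let plan = snapshot_read_plan(&progress);
    assert_eq!(plan.get(&0), None);           // completed vnode is skipped
    assert_eq!(plan.get(&1), Some(&Some(7))); // resume after offset 7
    assert_eq!(plan.get(&2), Some(&None));    // read from the beginning
}
```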

Changes to update processing (cr @hzxa21):

  • Skip a row if it belongs to a not-yet-scanned vnode, or to a scanning vnode whose current_pos is smaller than the row's position.
  • Emit a row downstream if it belongs to a fully-scanned vnode.
  • Emit a row downstream if it belongs to a scanning vnode whose current_pos is larger than the row's position.
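The three rules above can be condensed into a single predicate. This is a sketch with hypothetical names (`should_emit`, i64 positions); the real code compares pk rows against per-vnode state, and how ties at current_pos are handled is an assumption here:

```rust
#[derive(Clone, Debug, PartialEq)]
enum Progress {
    NotStarted,
    InProgress { current_pos: i64 },
    Completed,
}

/// Decide whether an upstream update row should be emitted downstream.
/// Rows the snapshot read has not reached yet are skipped, because the
/// snapshot will deliver them later; rows already covered must be emitted.
fn should_emit(row_pos: i64, vnode_progress: &Progress) -> bool {
    match vnode_progress {
        Progress::NotStarted => false, // whole vnode is still pending snapshot read
        Progress::Completed => true,   // vnode fully scanned; snapshot won't cover it
        // Emit only rows at or below the backfill frontier
        // (ties treated as already scanned in this sketch).
        Progress::InProgress { current_pos } => row_pos <= *current_pos,
    }
}

fn main() {
    assert!(!should_emit(5, &Progress::NotStarted));
    assert!(should_emit(5, &Progress::Completed));
    assert!(should_emit(5, &Progress::InProgress { current_pos: 7 }));
    assert!(!should_emit(9, &Progress::InProgress { current_pos: 7 }));
}
```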

Other refactoring:

  • To reuse interfaces from backfill, backfill.rs is refactored into backfill/no_shuffle_backfill.rs, and common utilities are moved to backfill/utils.rs.

Here are further changes to existing interfaces (OUTDATED, IGNORE THIS SECTION):
  • These changes are mainly for the state table to support reads. Previously we used the StorageTable interface, which already supported the old read pattern; the new read pattern is different, so we need additional interfaces.
  • Implement a state table interface for iterating through the state_table by vnodes. This is supported by iter_all_vnodes_with_pk_range.
  • Support ordered iteration of rows. This is done by duplicating the merge_sort implementation from batch. Ideally I would reuse the original merge_sort, but there are some difficulties, as described in the docstring.
  • Also support iteration by chunk via collect_data_chunk. I refactored it so it can be reused, but can't quite keep the method in a trait due to orphan rules.
  • I kept the TableIter interface, since several call sites still use its next_row method. It can be removed later.

Regarding persisting arrangement backfill state:

  • Currently I keep the logic from backfill used to persist the state.
  • This is because it requires minimal effort to maintain (we just need to update pk_in_output_indices).
  • If it causes trouble after arrangement_backfill is implemented e2e, I will remove it.
  • Otherwise it makes sense to keep it, since it would be much harder to add back later.

Checklist For Contributors

  • I have written necessary rustdoc comments
  • I have added necessary unit tests and integration tests
  • I have added fuzzing tests or opened an issue to track them. (Optional, recommended for new SQL features Sqlsmith: Sql feature generation #7934).
  • I have demonstrated that backward compatibility is not broken by breaking changes and created issues to track deprecated features to be removed in the future. (Please refer to the issue)
  • All checks passed in ./risedev check (or alias, ./risedev c)

Checklist For Reviewers

  • I have requested macro/micro-benchmarks as this PR can affect performance substantially, and the results are shown.

Documentation

  • My PR DOES NOT contain user-facing changes.

Types of user-facing changes

Please keep the types that apply to your changes, and remove the others.

  • Installation and deployment
  • Connector (sources & sinks)
  • SQL commands, functions, and operators
  • RisingWave cluster configuration changes
  • Other (please specify in the release note below)

Release note

@github-actions github-actions bot added the type/feature Type: New feature. label Jun 9, 2023
@github-actions github-actions bot added the user-facing-changes Contains changes that are visible to users label Jun 15, 2023
@kwannoel kwannoel changed the title from "feat(streaming): arrangement backfill" to "feat(streaming): arrangement backfill [IGNORE ME, WIP]" Jun 15, 2023
@kwannoel kwannoel marked this pull request as ready for review June 15, 2023 08:34

codecov bot commented Jun 15, 2023

Codecov Report

Merging #10266 (9cc6ad2) into main (5443a1e) will decrease coverage by 0.09%.
The diff coverage is 13.75%.

@@            Coverage Diff             @@
##             main   #10266      +/-   ##
==========================================
- Coverage   70.34%   70.26%   -0.09%     
==========================================
  Files        1274     1276       +2     
  Lines      219041   219308     +267     
==========================================
+ Hits       154085   154097      +12     
- Misses      64956    65211     +255     
Flag Coverage Δ
rust 70.26% <13.75%> (-0.09%) ⬇️

Flags with carried forward coverage won't be shown.

Impacted Files Coverage Δ
src/batch/src/executor/row_seq_scan.rs 21.49% <ø> (ø)
src/storage/src/table/batch_table/storage_table.rs 86.81% <0.00%> (ø)
...ream/src/executor/backfill/arrangement_backfill.rs 0.00% <0.00%> (ø)
...tream/src/executor/backfill/no_shuffle_backfill.rs 0.00% <0.00%> (ø)
src/stream/src/executor/backfill/utils.rs 0.00% <0.00%> (ø)
src/stream/src/executor/batch_query.rs 82.22% <ø> (ø)
src/stream/src/executor/mod.rs 51.92% <ø> (ø)
src/stream/src/common/table/state_table.rs 88.53% <75.00%> (-0.22%) ⬇️
src/storage/src/table/merge_sort.rs 88.67% <100.00%> (ø)
src/storage/src/table/mod.rs 83.01% <100.00%> (+1.01%) ⬆️
... and 1 more

... and 7 files with indirect coverage changes


@kwannoel kwannoel force-pushed the kwannoel/arrangement-backfill branch from 83c4e01 to 16ece5b June 16, 2023 09:37

kwannoel commented Jul 4, 2023

I have updated the implementation; it should now match the proposal from @hzxa21: #10266 (comment).

Will leave some comments on certain parts which may need further review.

Will do a second pass tomorrow as well.

@github-actions github-actions bot added the user-facing-changes Contains changes that are visible to users label Jul 4, 2023
@kwannoel kwannoel removed the user-facing-changes Contains changes that are visible to users label Jul 4, 2023
@kwannoel kwannoel marked this pull request as ready for review July 4, 2023 16:50
@github-actions github-actions bot added the user-facing-changes Contains changes that are visible to users label Jul 4, 2023

chenzl25 commented Jul 6, 2023

The new snapshot_read_per_vnode LGTM.

@kwannoel kwannoel requested a review from hzxa21 July 7, 2023 04:36

@hzxa21 hzxa21 left a comment

Nicely done. Thanks for the PR! Left some comments.

Rest LGTM!

debug_assert!(!state_table.vnode_bitmap().is_empty());
let vnodes = state_table.vnodes().iter_vnodes();
let mut result = Vec::with_capacity(state_table.vnodes().len());
for vnode in vnodes {

Since get_progress_per_vnode is called when we need to init the backfill state, cache of state table is highly likely to be empty and each get_row will be an actual S3 I/O. How about using futures::future::try_join_all to issue state_table.get_row concurrently to reduce overall latency?


@kwannoel kwannoel Jul 7, 2023


I assumed the Rust compiler would be able to perform each get_row concurrently, even within a for loop. I guess I should build a small toy program to test it out.

I will follow your suggestion to be safe.

EDIT: Thinking about it again, if we leave it to the async runtime, it may not schedule all get_row calls concurrently, e.g. it could stagger them. So try_join_all will enforce concurrency.
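The sequential-vs-concurrent question can be made concrete with a toy program. This sketch models it with OS threads instead of futures (a stand-in get_row with fake latency, since a minimal example can't easily pull in an async runtime): calling each lookup in a loop serializes the latencies, while issuing them all first and then collecting overlaps them, which is the effect futures::future::try_join_all has for async calls.

```rust
use std::thread;
use std::time::Duration;

// Stand-in for state_table.get_row: pretend each lookup costs ~10ms of I/O.
fn get_row(vnode: u32) -> Result<u32, String> {
    thread::sleep(Duration::from_millis(10));
    Ok(vnode * 2)
}

// Calling in a loop: the latencies add up (N * 10ms).
fn fetch_sequential(vnodes: &[u32]) -> Result<Vec<u32>, String> {
    vnodes.iter().map(|v| get_row(*v)).collect()
}

// Issue all requests first, then wait: the latencies overlap (~10ms total).
// This mirrors what try_join_all does for futures.
fn fetch_concurrent(vnodes: &[u32]) -> Result<Vec<u32>, String> {
    let handles: Vec<_> = vnodes
        .iter()
        .map(|v| {
            let v = *v;
            thread::spawn(move || get_row(v))
        })
        .collect();
    handles.into_iter().map(|h| h.join().unwrap()).collect()
}

fn main() {
    let vnodes = [1, 2, 3];
    // Both strategies return the same rows; only the latency differs.
    assert_eq!(fetch_sequential(&vnodes), Ok(vec![2, 4, 6]));
    assert_eq!(fetch_concurrent(&vnodes), Ok(vec![2, 4, 6]));
}
```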


@kwannoel kwannoel Jul 7, 2023


Updated in 9cc6ad2

.iter()
.all(|(_, p)| *p == BackfillProgressPerVnode::Completed);
if is_completely_finished {
assert!(!first_barrier.is_newly_added(self.actor_id));

Isn't this also true when is_completely_finished == false?


@kwannoel kwannoel Jul 7, 2023


If is_completely_finished == false, this assertion can fail as well. For instance, all progress could be BackfillProgressPerVnode::NotStarted, meaning the actor may come from a new streaming graph, in which case the barrier is newly added and the assertion would not hold.

@kwannoel kwannoel requested a review from hzxa21 July 7, 2023 18:02
@kwannoel kwannoel added this pull request to the merge queue Jul 10, 2023
Merged via the queue into main with commit 019e7b4 Jul 10, 2023
@kwannoel kwannoel deleted the kwannoel/arrangement-backfill branch July 10, 2023 03:34
@kwannoel kwannoel removed the user-facing-changes Contains changes that are visible to users label Jul 10, 2023